Forest Garrote
Variable selection for high-dimensional linear models has received a lot of
attention lately, mostly in the context of l1-regularization. Part of the
attraction is the variable selection effect: parsimonious models are obtained,
which are very suitable for interpretation. In terms of predictive power,
however, these regularized linear models are often slightly inferior to machine
learning procedures like tree ensembles. Tree ensembles, on the other hand, usually lack a formal way of variable selection and are difficult to visualize. A Garrote-style convex penalty for tree ensembles, in particular Random
Forests, is proposed. The penalty selects functional groups of nodes in the
trees. These could be as simple as monotone functions of individual predictor
variables. This yields a parsimonious function fit, which lends itself easily
to visualization and interpretation. The predictive power is maintained at
least at the same level as the original tree ensemble. A key feature of the
method is that, once a tree ensemble is fitted, no further tuning parameter
needs to be selected. The empirical performance is demonstrated on a wide array
of datasets.
Comment: 16 pages, 3 figures.
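As a concrete illustration of the construction, the sketch below (Python with scikit-learn; the leaf-level basis, the fixed penalty level alpha, and all names are illustrative assumptions, not the paper's exact procedure) treats every leaf of a fitted Random Forest as one base function and reweights these functions with a non-negative Lasso, which plays the role of the garrote penalty:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def leaf_basis(forest, X):
    """One column per leaf: leaf indicator scaled by the leaf's mean response."""
    cols = []
    for tree in forest.estimators_:
        leaves = tree.apply(X)                 # leaf id of each sample
        values = tree.tree_.value.ravel()      # mean response stored per node
        for leaf in np.unique(leaves):
            cols.append((leaves == leaf) * values[leaf])
    return np.column_stack(cols)

B = leaf_basis(forest, X)
# Weights of 1/n_estimators on all columns reproduce the forest prediction;
# the non-negative Lasso shrinks these weights and sets most to exactly zero.
garrote = Lasso(alpha=0.1, positive=True).fit(B, y)
print("selected node functions:", int(np.sum(garrote.coef_ > 0)), "of", B.shape[1])
```

Giving every column a weight of 1/n_estimators reproduces the original forest prediction exactly, which is why shrinking non-negative weights away from that solution acts like a garrote on the node functions.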
Node harvest
When choosing a suitable technique for regression and classification with
multivariate predictor variables, one is often faced with a tradeoff between
interpretability and high predictive accuracy. To give a classical example,
classification and regression trees are easy to understand and interpret. Tree
ensembles like Random Forests usually provide more accurate predictions. Yet
tree ensembles are also more difficult to analyze than single trees and are
often criticized, perhaps unfairly, as `black box' predictors. Node harvest attempts to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. The resulting fits are very sparse and interpretable, and predictive accuracy is highly competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few
thousand nodes is generated randomly. If a new observation falls into just a
single node, its prediction is the mean response of all training observations within this node, identical to a tree-like prediction. A new observation typically falls into several nodes, and its prediction is then the weighted average of
the mean responses across all these nodes. The only role of node harvest is to
`pick' the right nodes from the initial large ensemble of nodes by choosing
node weights, which, in the proposed algorithm, amounts to a quadratic
programming problem with linear inequality constraints. The solution is sparse
in the sense that only very few nodes are selected with a nonzero weight. This
sparsity is not explicitly enforced. Perhaps surprisingly, it is not necessary to
select a tuning parameter for optimal predictive accuracy. Node harvest can
handle mixed data and missing values and is shown to be simple to interpret and
competitive in predictive accuracy on a variety of data sets.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS367 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
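The quadratic program can be written down almost verbatim. In the sketch below (Python; using cvxpy as the QP solver and using leaves of shallow random trees as the initial node ensemble are my own illustrative choices), non-negative node weights are chosen so that the weights of the nodes containing each training point sum to one, while the squared error of the resulting weighted-average predictions is minimized:

```python
import numpy as np
import cvxpy as cp
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# Initial random ensemble of nodes: leaves of shallow, randomized trees.
forest = RandomForestRegressor(n_estimators=20, max_depth=3,
                               random_state=0).fit(X, y)
indicators, means = [], []
for tree in forest.estimators_:
    leaves = tree.apply(X)
    for leaf in np.unique(leaves):
        member = leaves == leaf
        indicators.append(member.astype(float))
        means.append(y[member].mean())         # mean response within the node
I = np.column_stack(indicators)                # n_samples x n_nodes membership
mu = np.array(means)

w = cp.Variable(I.shape[1], nonneg=True)
# Prediction for each sample: weighted combination of the means of its nodes.
# The row constraint I @ w == 1 turns that combination into a weighted average.
objective = cp.Minimize(cp.sum_squares(y - (I * mu) @ w))
cp.Problem(objective, [I @ w == 1]).solve()
print("nodes with nonzero weight:", int(np.sum(w.value > 1e-6)), "of", I.shape[1])
```

Setting all weights to 1/n_estimators is always feasible here, since every observation falls into exactly one leaf per tree; the solver then trades squared error against that baseline, and the solution typically concentrates on very few nodes.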
High-dimensional graphs and variable selection with the Lasso
The pattern of zero entries in the inverse covariance matrix of a
multivariate normal distribution corresponds to conditional independence
restrictions between variables. Covariance selection aims at estimating those
structural zeros from data. We show that neighborhood selection with the Lasso
is a computationally attractive alternative to standard covariance selection
for sparse high-dimensional graphs. Neighborhood selection estimates the
conditional independence restrictions separately for each node in the graph and
is hence equivalent to variable selection for Gaussian linear models. We show
that the proposed neighborhood selection scheme is consistent for sparse
high-dimensional graphs. Consistency hinges on the choice of the penalty
parameter. The oracle value for optimal prediction does not lead to a
consistent neighborhood estimate. Controlling instead the probability of
falsely joining some distinct connectivity components of the graph, consistent
estimation for sparse graphs is achieved (with exponential rates), even when
the number of variables grows as the number of observations raised to an
arbitrary power.
Comment: Published at http://dx.doi.org/10.1214/009053606000000281 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
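The scheme reduces to one Lasso regression per node. A minimal sketch follows (Python with scikit-learn; the fixed penalty level and the simulated data are stand-ins, since the paper ties the penalty choice to the probability of falsely joining connectivity components rather than to prediction):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
X[:, 1] += 0.8 * X[:, 0]                     # plant one conditional dependence

alpha = 0.1                                  # stand-in penalty level
neighbors = np.zeros((p, p), dtype=bool)
for j in range(p):
    others = np.delete(np.arange(p), j)
    # Lasso regression of node j on all other nodes: the nonzero
    # coefficients estimate the neighborhood of j in the graph.
    coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
    neighbors[j, others] = coef != 0

# AND rule: keep edge (j, k) only if j selects k and k selects j.
graph = neighbors & neighbors.T
print("estimated edges:", np.argwhere(np.triu(graph)).tolist())
```

The OR rule, keeping an edge when either endpoint selects the other, is the natural alternative way of combining the two per-node estimates of each edge.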
LASSO ISOtone for High Dimensional Additive Isotonic Regression
Additive isotonic regression attempts to determine the relationship between a
multi-dimensional observation variable and a response, under the constraint
that the estimate is the additive sum of univariate component effects that are
monotonically increasing. In this article, we present a new method for such
regression called LASSO Isotone (LISO). LISO adapts ideas from sparse linear
modelling to additive isotonic regression. Thus, it is viable in many situations with high-dimensional predictor variables, where selection of significant versus insignificant variables is required. We suggest an
algorithm involving a modification of the backfitting algorithm CPAV. We give a
numerical convergence result, and finally examine some of its properties
through simulations. We also suggest some possible extensions that improve
performance, and allow calculation to be carried out when the direction of the
monotonicity is unknown.
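A rough sketch of such a backfitting cycle is given below (Python with scikit-learn). The per-component update used here, an isotonic fit of the partial residuals followed by soft-thresholding of the centered fit, is a crude stand-in for the paper's modified CPAV step, chosen because soft-thresholding preserves monotonicity and sets weak components exactly to zero; the penalty level lam is an arbitrary illustrative value:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.standard_normal((n, p))
y = np.maximum(X[:, 0], 0.0) + X[:, 1] ** 3 + 0.3 * rng.standard_normal(n)
y -= y.mean()                                # center; intercept handled separately

lam = 0.3                                    # stand-in penalty level
fits = np.zeros((n, p))                      # current additive component fits
for _ in range(20):                          # backfitting cycles
    for j in range(p):
        partial = y - fits.sum(axis=1) + fits[:, j]
        f = IsotonicRegression().fit(X[:, j], partial).predict(X[:, j])
        f -= f.mean()                        # center for identifiability
        # Soft-thresholding keeps f monotone and zeroes weak components.
        fits[:, j] = np.sign(f) * np.maximum(np.abs(f) - lam, 0.0)

active = [j for j in range(p) if np.ptp(fits[:, j]) > 0]
print("selected components:", active)
```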
Discussion of: Treelets--An adaptive multi-scale basis for sparse unordered data
This is a discussion of paper "Treelets--An adaptive multi-scale basis for
sparse unordered data" [arXiv:0707.0481] by Ann B. Lee, Boaz Nadler and Larry
Wasserman. In this paper the authors defined a new type of dimension reduction
algorithm, namely, the treelet algorithm. The treelet method has the merit of
being completely data driven, and its decomposition is easier to interpret as
compared to PCR. It is suitable in certain situations, but it also has its own limitations. I will discuss both the strengths and weaknesses of this
method when applied to microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS137E in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
Discussion: Latent variable graphical model selection via convex optimization
Discussion of "Latent variable graphical model selection via convex
optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky
[arXiv:1008.1290].
Comment: Published at http://dx.doi.org/10.1214/12-AOS980 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).